ABSTRACT

We present a simple primal-dual algorithm for computing approximate
Nash-equilibria in two-person zero-sum sequential games with
incomplete information and perfect recall (like Texas Hold'em
Poker). Our algorithm is numerically stable, performs only basic
iterations (i.e., matvec multiplications, clipping, etc., with no
calls to external first-order oracles and no matrix inversions), and
is applicable to a broad class of two-person zero-sum games, including
simultaneous games and sequential games with incomplete information
and perfect recall. The applicability to the latter kind of games is
thanks to the sequence-form representation, which allows us to encode
any such game as a matrix game with convex polytopal strategy
profiles. We prove that the number of iterations needed to produce a
Nash-equilibrium with a given precision is inversely proportional to
the precision. As a proof-of-concept, we present experimental results
on matrix games on simplexes and on Kuhn Poker.

Index Terms: Nash-equilibrium, sequential games, incomplete
information, perfect recall, convex optimization

1 Introduction 

A game-theoretic approach to playing games strategically optimally
consists of computing Nash-equilibria (in fact, approximations
thereof) offline, and playing one's part (an optimal behavioral
strategy) of the equilibrium online. This technique is the driving
force behind solution concepts like CFR [?], CFR+ [3] and other
variants, which have recently had profound success in Poker. However,
solving games for equilibria remains a mathematical and computational
challenge, especially in sequential games with imperfect information.
In this paper, we propose a simple primal-dual algorithm for solving
for such equilibria approximately (in a sense to be made precise in
Definition 3 below). Our detailed contributions are sketched in
subsection 1.3 below and elaborated in section 3.

1.1 Statement of the problem 

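The sequence-form representation for two-person zero-sum games with
incomplete information was introduced in [?], and the theory was
further developed in [?], where it was established that for such
games, there exist sparse matrices $A \in \mathbb{R}^{n_1 \times n_2}$,
$E_1 \in \mathbb{R}^{l_1 \times n_1}$, $E_2 \in \mathbb{R}^{l_2 \times n_2}$,
and vectors $e_1 \in \mathbb{R}^{l_1}$, $e_2 \in \mathbb{R}^{l_2}$ such
that $n_1$, $n_2$, $l_1$, and $l_2$ are all linear in the size of the
game tree (number of states in the game) and such that Nash-equilibria
correspond to pairs $(x, y)$ of realization plans which solve the
primal LCP (Linear Convex Program)

$$\underset{(y,p) \in \mathbb{R}^{n_2} \times \mathbb{R}^{l_1}}{\text{minimize}}\ \langle e_1, p\rangle \quad \text{subject to: } y \ge 0,\ E_2 y = e_2,\ -Ay + E_1^T p \ge 0, \tag{1}$$

and the dual LCP

$$\underset{(x,q) \in \mathbb{R}^{n_1} \times \mathbb{R}^{l_2}}{\text{maximize}}\ -\langle e_2, q\rangle \quad \text{subject to: } x \ge 0,\ E_1 x = e_1,\ A^T x + E_2^T q \ge 0. \tag{2}$$

Each $E_k$ is a matrix whose entries are $-1$, $0$ or $1$, with exactly
one entry per row equal to $-1$, except for the first row, whose first
entry is $1$ and all the others are $0$. Each of the vectors $e_k$ is
of the form $(1, 0, \ldots, 0)$.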

The LCPs above have the equivalent saddle-point formulation

$$\underset{y \in Q_2}{\text{minimize}}\ \underset{x \in Q_1}{\text{maximize}}\ \langle x, Ay\rangle, \tag{3}$$

where the compact convex polytope

$$Q_k := \{z \in \mathbb{R}^{n_k} \mid z \ge 0,\ E_k z = e_k\} \subseteq [0, 1]^{n_k} \tag{4}$$

is identified with the strategy profile of player $k$ in the
sequence-form representation. At a feasible point $(y, p, x, q)$ for
the LCPs, the duality gap $G(y, p, x, q)$ is given by

$$0 \le G(y,p,x,q) := \langle e_1, p\rangle - (-\langle e_2, q\rangle) = \langle e_1, p\rangle + \langle e_2, q\rangle = G(x,y) := \max\{\langle u, Ay\rangle - \langle x, Av\rangle \mid (u,v) \in Q_1 \times Q_2\}. \tag{5}$$

In (5), the quantity $G(x, y)$ is nothing but the primal-dual gap for
the equivalent saddle-point problem (3). It was shown (see Theorem
3.14 of [?]) that a pair $(x, y) \in Q_1 \times Q_2$ of realization
plans is a solution to the LCPs (1) and (2) (i.e., is a
Nash-equilibrium for the game) if and only if there exist vectors $p$
and $q$ such that

$$-Ay + E_1^T p \ge 0,\quad A^T x + E_2^T q \ge 0,\quad \langle x, -Ay + E_1^T p\rangle = 0,\quad \langle y, A^T x + E_2^T q\rangle = 0. \tag{6}$$

Moreover, at equilibria strong duality holds and the value of the
game equals $p_0 = -q_0$, i.e., the duality gap $G(y,p,x,q)$ defined
in (5) vanishes at equilibria.

Definition 1 (Nash $\epsilon$-equilibria). Given $\epsilon > 0$, a Nash
$\epsilon$-equilibrium is a pair $(x^*, y^*)$ of realization plans for
which there exist dual vectors $p^*$ and $q^*$ for problems (1) and (2)
such that the duality gap at $(y^*, p^*, x^*, q^*)$ doesn't exceed
$\epsilon$. That is,

$$0 \le G(y^*, p^*, x^*, q^*) \le \epsilon. \tag{7}$$


1.2 A remark concerning matrix games on simplexes 

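It should be noted that any matrix $A \in \mathbb{R}^{n_1 \times n_2}$
specifies a matrix game with payoff matrix $A$, for which player $k$'s
strategy profile is a simplex

$$\Delta_{n_k} := \Big\{z \in \mathbb{R}^{n_k} \;\Big|\; z \ge 0,\ \sum_{j} z_j = 1\Big\}. \tag{8}$$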


This simplex can be written as a compact convex polytope in the form
(4) by taking $E_k := (1, 1, \ldots, 1) \in \mathbb{R}^{1 \times n_k}$
and $e_k = 1 \in \mathbb{R}^1$. Thus every matrix game on simplexes can
be seen as a sequential game, and so the results presented in this
manuscript can be trivially applied to such games in particular. For
this special sub-class of sequential games, the duality gap function
$G(x, y)$ writes

$$G(x,y) = \max\{\langle u, Ay\rangle - \langle x, Av\rangle \mid (u,v) \in \Delta_{n_1} \times \Delta_{n_2}\} = \max_{0 \le i < n_1} (Ay)_i - \min_{0 \le j < n_2} (A^T x)_j. \tag{9}$$

In the LCPs (1) and (2), the vectors
$p = (p_0, p_1, \ldots, p_{l_1 - 1}) \in \mathbb{R}^{l_1}$ and
$q = (q_0, q_1, \ldots, q_{l_2 - 1}) \in \mathbb{R}^{l_2}$ are dual
variables, $A$ is the payoff matrix, and each $E_k$ is a matrix with
entries in $\{-1, 0, 1\}$.

1 The first inequality in (5) is due to weak duality.
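To make formula (9) concrete, the following minimal Python sketch evaluates the duality gap of a pair of mixed strategies in a matrix game on simplexes. The function name `duality_gap` and the rock-paper-scissors payoff matrix are illustrative assumptions; they do not come from the experiments reported in this paper.

```python
def duality_gap(A, x, y):
    """Duality gap (9): max_i (A y)_i - min_j (A^T x)_j.

    A is an n1-by-n2 payoff matrix given as a list of rows; x and y are
    mixed strategies of the maximizing and minimizing player.
    """
    n1, n2 = len(A), len(A[0])
    Ay = [sum(A[i][j] * y[j] for j in range(n2)) for i in range(n1)]
    ATx = [sum(A[i][j] * x[i] for i in range(n1)) for j in range(n2)]
    return max(Ay) - min(ATx)


# Rock-paper-scissors (illustrative payoff matrix, not from the paper).
RPS = [[0.0, -1.0, 1.0],
       [1.0, 0.0, -1.0],
       [-1.0, 1.0, 0.0]]
uniform = [1.0 / 3.0] * 3
pure_rock = [1.0, 0.0, 0.0]

gap_at_equilibrium = duality_gap(RPS, uniform, uniform)
gap_at_pure = duality_gap(RPS, pure_rock, pure_rock)
```

In this example the gap vanishes at the uniform strategies, which form an equilibrium, and is strictly positive when both players commit to a pure strategy.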



1.3 Quick sketch of our contribution 

We now give a brief overview of our contributions, which will be
elaborated in section 3. Building on an alternative notion of
approximate equilibria (see Definition 3), homologous to that
presented in Definition 1, we devise a simple numerically stable
primal-dual algorithm (Algorithm 1) for computing approximate
Nash-equilibria in sequential two-person zero-sum games with
incomplete information and perfect recall. On each iteration, the
only operations performed by our algorithm are of the form $A^T x$,
$Ay$, $E_1^T p$, $E_2^T q$, and $(x)_+ := (\max(0, x_j))_j$. We also
prove (Theorem 2) that, in an ergodic / Cesàro sense, the number of
iterations required by the algorithm to produce an approximate
equilibrium to a precision $\epsilon$ is $O(1/\epsilon)$, with explicit
values for the constants involved.

1.4 Notation and terminology 

General. Let $m$ and $n$ be positive integers. The components of a
vector $z \in \mathbb{R}^n$ will be denoted $z_0, z_1, \ldots, z_{n-1}$
(indexing begins from 0, not 1).
$\mathbb{R}^n_+ := \{z \in \mathbb{R}^n \mid z \ge 0\}$ is the
nonnegative $n$th orthant. $\|z\|$ denotes the 2-norm of $z$, defined
by $\|z\| := \sqrt{\langle z, z\rangle}$. Given a matrix
$A \in \mathbb{R}^{m \times n}$, its spectral norm, denoted $\|A\|$, is
defined to be the largest singular value of $A$, i.e., the square root
of the largest eigenvalue of $A^T A$ (or equivalently, of $A A^T$).

Convex analysis. Given a subset $C \subseteq \mathbb{R}^n$, $i_C$
denotes the indicator function of $C$, defined by $i_C(x) = 0$ if
$x \in C$ and $+\infty$ otherwise. At times, we will write
$i_{x \in C}$ for $i_C(x)$ (to ease notation). For example, we will
write $i_{z \ge 0}$ for $i_{\mathbb{R}^n_+}(z)$. The orthogonal
projector onto $C$ is the "closest-point" map
$\mathrm{proj}_C : \mathbb{R}^n \to C$,
$x \mapsto \mathrm{argmin}_{z \in C} \frac{1}{2}\|z - x\|^2$. Let
$f : \mathbb{R}^n \to (-\infty, +\infty]$ be a convex function. The
effective domain of $f$, denoted $\mathrm{dom}(f)$, is defined as
$\mathrm{dom}(f) := \{x \in \mathbb{R}^n \mid f(x) < +\infty\}$. If
$\mathrm{dom}(f) \ne \emptyset$ then we say $f$ is proper. The
subdifferential of $f$ at a point $x \in \mathbb{R}^n$ is defined by
$\partial f(x) := \{v \in \mathbb{R}^n \mid f(z) \ge f(x) + \langle v, z - x\rangle,\ \forall z \in \mathbb{R}^n\}$.
If $f$ is convex, its proximal operator is the function
$\mathrm{prox}_f : \mathbb{R}^n \to \mathbb{R}^n$ defined by
$\mathrm{prox}_f(x) := \mathrm{argmin}_{z \in \mathbb{R}^n} \frac{1}{2}\|z - x\|^2 + f(z)$.

We recommend [?] for a more detailed exposition on convex analysis
and its use in modern optimization theory and practice.

2 Prior work 

Here, we present a selection of algorithms that is representative of
the efforts that have been made in the literature to compute Nash
$\epsilon$-equilibria for two-person zero-sum games with incomplete
information like Texas Hold'em Poker. It should be noted that for the
class of games considered here (sequential games with incomplete
information), the LCPs (1) and (2) are far larger than what
state-of-the-art LCP and interior-point solvers can handle (see
[11, 12]).

2.1 Regret minimization 

CFR (CounterFactual Regret minimization) [?], Monte Carlo CFR [?],
and CFR+ [3], by their large popularity, have become the definitive
state-of-the-art, and are particularly useful in many-player games,
since convex-analytical methods cannot help much in such games. Also,
they can be shown to converge to a Nash-equilibrium provided each
player uses a CFR scheme to play the game [?], but have a much weaker
convergence theory. For example, [2] showed that such schemes have a
prohibitive running time of $O(1/\epsilon^2)$ to produce a Nash
$\epsilon$-equilibrium.

2.2 First-order methods 

In [?], a nested iterative procedure using the Excessive Gap
Technique (EGT) [?] (EGT and precursors are well-known to the
signal-processing community [14]) was used to solve the equilibrium
problem (3). The authors reported an $O(1/\epsilon)$ convergence rate
(which derives from the general EGT theory) for the outer-most
iteration loop. [?] proposed a modified version of the techniques in
[?] and proved an $O((\|A\|/\delta)\log(1/\epsilon))$ convergence rate
in terms of the number of calls made to a first-order oracle. Here
$\delta = \delta(A, E_1, E_2, e_1, e_2) > 0$ is a certain condition
number for the game. The crux of their technique was to observe that
(3) can further be written as the minimization of the duality gap
function $G(x, y)$ (defined in (5)) for the game (3), viz.

$$\text{minimize}\ \{G(x, y) \mid (x, y) \in Q_1 \times Q_2\}, \tag{10}$$

and then show there exists a scalar $\delta > 0$ such that for any
pair of realization plans $(x, y) \in Q_1 \times Q_2$,

$$\text{"distance between } (x, y) \text{ and set of equilibria"} \le G(x, y)/\delta. \tag{11}$$

Their algorithm is then derived by iteratively applying Nesterov
smoothing [?] with a geometrically decreasing sequence of tolerance
levels $\epsilon_{n+1} = \epsilon_n/\gamma$ (with $\gamma > 1$).
However, there are a number of issues, most notably: (a) The constant
$\delta > 0$ can be arbitrarily small, and so the factor
$\|A\|/\delta$ in the $O((\|A\|/\delta)\log(1/\epsilon))$ convergence
rate can be arbitrarily large for ill-conditioned games. (b) The
reported linear convergence rate is not in terms of basic operations
(addition, multiplication, matvec, clipping, etc.), but in terms of
the number of calls to a first-order oracle. Most notably,
projections onto the polytopes $Q_k$ are computed on each iteration,
a very hard sub-problem.

Recently, [?] proposed accelerations to first-order methods for
computing Nash-equilibria (including those just discussed), by an
appropriate choice of the underlying Bregman distance and the
distance generating function (essential ingredients in EGT-type
algorithms). These modifications provably gain a constant factor in
the worst-case convergence rate over the original algorithm.

2.3 Primal-dual algorithms 

The primal-dual algorithm first developed in [?] was proposed in
[18] as a way of solving matrix games on simplexes. Notably, such
matrix games on simplexes are considerably simpler than the games
considered here. Indeed, the authors in [18] used the fact that
computing the orthogonal projection of a point onto a simplex can be
done in linear time as in [19]. In contrast, no such efficient
algorithm is known, nor is likely to exist, for the polytopes $Q_k$
defined in (4). That notwithstanding, such projections can still be
done iteratively using, for example, the algorithm in Proposition 4.2
of [20] or the algorithms developed in [21]. Unfortunately, as with
any nested iterative scheme, one would have to solve this sub-problem
with finer and finer precision, rendering the overall solver
impractical. One can also cite [22], in which the authors developed
an iterative projection algorithm onto polytopes in outer
representation.

Other than the difficult projection sub-problem just discussed, the
duality gap might explode even at points arbitrarily close to the set
of feasible points, leaving the algorithm with no indication
whatsoever of whether progress is being made.

3 Our contributions 

3.1 Generalized Saddle-point Problem (GSP) equivalent 
for Nash-equilibrium LCPs 

In the next theorem, we show that the LCPs (1) and (2) can be
conveniently written as a GSP in the sense of [23]. The crux of the
idea is to remove the linear constraints in the definitions of the
strategy polytopes $Q_k$, by augmenting the payoff matrix to yield an
equivalent saddle-point problem. The result is an equivalent game
with unbounded strategy profiles (nonnegative orthants) with much
simpler geometry. We elaborate the construction in the following
theorem.

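Theorem 1. Define two proper closed convex functions

$$\begin{aligned}
g_1 &: \mathbb{R}^{n_2} \times \mathbb{R}^{l_1} \to (-\infty, +\infty], & g_1(y, p) &:= i_{y \ge 0} + \langle e_1, p\rangle,\\
g_2 &: \mathbb{R}^{n_1} \times \mathbb{R}^{l_2} \to (-\infty, +\infty], & g_2(x, q) &:= i_{x \ge 0} + \langle e_2, q\rangle.
\end{aligned} \tag{12}$$

Also define two bilinear forms
$\Psi_1, \Psi_2 : \mathbb{R}^{n_2} \times \mathbb{R}^{l_1} \times \mathbb{R}^{n_1} \times \mathbb{R}^{l_2} \to \mathbb{R}$
by letting

$$K := \begin{bmatrix} A & -E_1^T \\ E_2 & 0 \end{bmatrix}, \qquad \Psi_1(y, p, x, q) := \left\langle \begin{bmatrix} x \\ q \end{bmatrix}, K \begin{bmatrix} y \\ p \end{bmatrix} \right\rangle, \tag{13}$$

with $\Psi_2 := -\Psi_1$, and define the functions
$\hat{\Psi}_1, \hat{\Psi}_2 : \mathbb{R}^{n_2} \times \mathbb{R}^{l_1} \times \mathbb{R}^{n_1} \times \mathbb{R}^{l_2} \to (-\infty, +\infty]$
by

$$\hat{\Psi}_1(y,p,x,q) := \begin{cases} \Psi_1(y,p,x,q) + g_1(y,p), & \text{if } y \ge 0,\\ +\infty, & \text{otherwise,} \end{cases} \qquad \hat{\Psi}_2(y,p,x,q) := \begin{cases} \Psi_2(y,p,x,q) + g_2(x,q), & \text{if } x \ge 0,\\ +\infty, & \text{otherwise.} \end{cases} \tag{14}$$

Finally, define the sets
$S_1 := \mathbb{R}^{n_2}_+ \times \mathbb{R}^{l_1}$ and
$S_2 := \mathbb{R}^{n_1}_+ \times \mathbb{R}^{l_2}$, and consider the
GSP$(\Psi_1, \Psi_2, g_1, g_2)$: Find a quadruplet
$(y^*, p^*, x^*, q^*) \in S_1 \times S_2$ such that for all
$(y, p, x, q) \in S_1 \times S_2$, we have

$$\hat{\Psi}_1(y^*, p^*, x^*, q^*) \le \hat{\Psi}_1(y, p, x^*, q^*) \quad \text{and} \quad \hat{\Psi}_2(y^*, p^*, x^*, q^*) \le \hat{\Psi}_2(y^*, p^*, x, q). \tag{15}$$

Then GSP$(\Psi_1, \Psi_2, g_1, g_2)$ is equivalent to the LCPs (1) and
(2), i.e., a quadruplet
$(y^*, p^*, x^*, q^*) \in \mathbb{R}^{n_2} \times \mathbb{R}^{l_1} \times \mathbb{R}^{n_1} \times \mathbb{R}^{l_2}$
solves the LCPs (1) and (2) iff it solves
GSP$(\Psi_1, \Psi_2, g_1, g_2)$.

2 The minimizers of $G$ in (10) are precisely the equilibria of the game.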

Proof. It suffices to show that at any point
$(y, p, x, q) \in S_1 \times S_2$, the duality gap between the primal
LCP (1) and the dual LCP (2) equals the duality gap of
GSP$(\Psi_1, \Psi_2, g_1, g_2)$. Indeed, the unconstrained objective
in (1), say $a(y, p)$, can be computed as

$$\begin{aligned}
a(y,p) &= \langle e_1, p\rangle + i_{y \ge 0} + i_{-Ay + E_1^T p \ge 0} + i_{E_2 y = e_2}\\
&= g_1(y,p) + \max_{x' \ge 0} \langle x', Ay - E_1^T p\rangle + \max_{q'} \langle q', E_2 y - e_2\rangle\\
&= g_1(y,p) + \max_{x', q'} \big[\langle x', Ay\rangle - \langle x', E_1^T p\rangle + \langle q', E_2 y\rangle - (i_{x' \ge 0} + \langle e_2, q'\rangle)\big]\\
&= g_1(y,p) - \min_{x', q'} \big[\Psi_2(y,p,x',q') + g_2(x',q')\big]\\
&= g_1(y,p) - \min_{x', q'} \hat{\Psi}_2(y,p,x',q') = g_1(y,p) - \phi_2(y,p),
\end{aligned}$$

where $\phi_2(y,p) := \min_{x', q'} \hat{\Psi}_2(y,p,x',q')$. Similarly,
the unconstrained objective, say $b(x, q)$, in the dual LCP (2) writes

$$\begin{aligned}
b(x,q) &= -\langle q, e_2\rangle - i_{x \ge 0} - i_{A^T x + E_2^T q \ge 0} - i_{E_1 x = e_1}\\
&= -g_2(x,q) + \min_{y' \ge 0} \langle y', A^T x + E_2^T q\rangle + \min_{p'} \langle p', e_1 - E_1 x\rangle\\
&= -g_2(x,q) + \min_{y', p'} \big[\Psi_1(y',p',x,q) + g_1(y',p')\big]\\
&= -g_2(x,q) + \min_{y', p'} \hat{\Psi}_1(y',p',x,q) = -g_2(x,q) + \phi_1(x,q),
\end{aligned}$$

where $\phi_1(x,q) := \min_{y', p'} \hat{\Psi}_1(y',p',x,q)$. Thus,
noting that $-\infty < \phi_1(x,q), \phi_2(y,p) < +\infty$ (so that
all the operations below are valid), one computes the duality gap
between the primal LCP (1) and the dual LCP (2) at $(y, p, x, q)$ as

$$\begin{aligned}
a(y,p) - b(x,q) &= g_1(y,p) - \phi_2(y,p) + g_2(x,q) - \phi_1(x,q)\\
&= \Psi_1(y,p,x,q) + g_1(y,p) - \phi_2(y,p) + \Psi_2(y,p,x,q) + g_2(x,q) - \phi_1(x,q)\\
&= \hat{\Psi}_1(y,p,x,q) - \phi_1(x,q) + \hat{\Psi}_2(y,p,x,q) - \phi_2(y,p)\\
&= \text{duality gap of GSP}(\Psi_1, \Psi_2, g_1, g_2) \text{ at } (y,p,x,q),
\end{aligned}$$

where the second equality follows from $\Psi_1 + \Psi_2 = 0$. □

By Theorem 1, solving for a Nash-equilibrium for the game is
equivalent to solving the GSP (15) which, as it turns out, is
conceptually simpler (e.g., we no longer need to compute the
complicated orthogonal projections $\mathrm{proj}_{Q_k}$). The rest of
the paper will be devoted to developing an algorithm for solving the
latter.
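The block structure behind the GSP can be checked numerically. The sketch below assembles the augmented matrix $K = [A, -E_1^T; E_2, 0]$ from Theorem 1 and verifies on a small example that $\langle (x, q), K (y, p)\rangle$ coincides with $\langle x, Ay - E_1^T p\rangle + \langle q, E_2 y\rangle$. The helper names (`build_K`, `psi1_blocks`) and the numerical values are hypothetical and chosen only for illustration; dense lists are used for readability, whereas a real solver would keep $A$, $E_1$, $E_2$ sparse and never form $K$ explicitly.

```python
def matvec(M, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_K(A, E1, E2):
    """Assemble K = [[A, -E1^T], [E2, 0]] as a dense list of rows."""
    n1, l1 = len(A), len(E1)
    top = [list(A[i]) + [-E1[r][i] for r in range(l1)] for i in range(n1)]
    bottom = [list(row) + [0.0] * l1 for row in E2]
    return top + bottom


def psi1_blocks(A, E1, E2, y, p, x, q):
    """Psi_1(y, p, x, q) = <x, A y - E1^T p> + <q, E2 y>, block by block."""
    Ay = matvec(A, y)
    E1T_p = [sum(E1[r][i] * p[r] for r in range(len(E1)))
             for i in range(len(A))]
    return dot(x, [a - b for a, b in zip(Ay, E1T_p)]) + dot(q, matvec(E2, y))


# Illustrative data: a 2x2 matrix game with simplex constraints.
A = [[2.0, -1.0], [-1.0, 1.0]]
E1 = [[1.0, 1.0]]
E2 = [[1.0, 1.0]]
y, p, x, q = [0.3, 0.7], [0.5], [0.6, 0.4], [-0.25]

K = build_K(A, E1, E2)
via_K = dot(x + q, matvec(K, y + p))
via_blocks = psi1_blocks(A, E1, E2, y, p, x, q)
```

Both evaluations agree up to rounding, which is all that the definition (13) asserts.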


3.2 The proposed algorithm 

We now derive the algorithm (Algorithm 1) for computing Nash
$(\epsilon, 0)$-equilibria and establish its theoretical properties.
The algorithm, which emerges as a synthesis of Theorem 1 above and
ideas from [23], is numerically stable and performs only basic
iterations (i.e., matvec multiplications, clipping, etc., with no
calls to external first-order oracles and no matrix inversions).

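Definition 2. Given $\epsilon \ge 0$ and a function
$f : \mathbb{R}^n \to (-\infty, +\infty]$, the $\epsilon$-enlarged
subdifferential (or $\epsilon$-subdifferential, for short) of $f$ is
the set-valued function defined by

$$\partial_\epsilon f(x) := \{v \in \mathbb{R}^n \mid f(z) \ge f(x) + \langle v, z - x\rangle - \epsilon,\ \forall z \in \mathbb{R}^n\}. \tag{16}$$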
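As a quick illustration of Definition 2 (a worked example added here for concreteness, not taken from the cited references), consider the one-dimensional quadratic $f(z) = \frac{1}{2}z^2$. Its $\epsilon$-subdifferential can be computed in closed form:

```latex
% v \in \partial_\epsilon f(x)
%   \iff \sup_{z} \big[ v (z - x) - \tfrac{1}{2} z^2 + \tfrac{1}{2} x^2 \big] \le \epsilon .
% The supremum is attained at z = v, where it equals \tfrac{1}{2}(v - x)^2, hence
\partial_\epsilon f(x)
  = \big\{ v \in \mathbb{R} \;\big|\; \tfrac{1}{2}(v - x)^2 \le \epsilon \big\}
  = \big[\, x - \sqrt{2\epsilon},\; x + \sqrt{2\epsilon} \,\big].
```

For $\epsilon = 0$ this interval collapses to the usual subdifferential $\partial f(x) = \{x\}$, and it widens continuously as $\epsilon$ grows.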

The idea behind $\epsilon$-subdifferentials is the following. Say we
wish to minimize a convex function $f$. Replace the usual necessary
and sufficient condition "$0 \in \partial f(x)$" for the optimality of
$x$ with the weaker condition "$\partial_\epsilon f(x)$ contains a
sufficiently small vector $v$". This approximation concept for
subdifferentials yields yet another notion of approximate
Nash-equilibria (refer to [23]), namely

Definition 3 (Nash $(\epsilon_1, \epsilon_2)$-equilibria). Given
tolerance levels $\epsilon_1, \epsilon_2 \ge 0$, a Nash
$(\epsilon_1, \epsilon_2)$-equilibrium for the GSP (15) is any
quadruplet $(y^*, p^*, x^*, q^*)$ for which there exists a
perturbation vector $v^*$ such that $\|v^*\| \le \epsilon_1$ and
$v^* \in \partial_{\epsilon_2}[\hat{\Psi}_1(\cdot, \cdot, x^*, q^*) + \hat{\Psi}_2(y^*, p^*, \cdot, \cdot)](y^*, p^*, x^*, q^*)$.
Such a vector $v^*$ is called a Nash $(\epsilon_1, \epsilon_2)$-residual
at the point $(y^*, p^*, x^*, q^*)$.

The above definition is a generalization of the notion of
Nash-equilibria since: (a) exact Nash-equilibria correspond to Nash
$(0, 0)$-equilibria, and (b) Nash $\epsilon$-equilibria (in the sense
of Definition 1) correspond to Nash $(0, \epsilon)$-equilibria.


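Algorithm 1 Primal-dual algorithm for computing Nash
$(\epsilon, 0)$-equilibria in two-person zero-sum sequential games

Require: $\epsilon > 0$; $(y^{(0)}, p^{(0)}, x^{(0)}, q^{(0)}) \in \mathbb{R}^{n_2} \times \mathbb{R}^{l_1} \times \mathbb{R}^{n_1} \times \mathbb{R}^{l_2}$.
Ensure: A Nash $(\epsilon, 0)$-equilibrium $(y^*, p^*, x^*, q^*) \in S_1 \times S_2$ for the GSP (15).

 1: Initialize: $\lambda \leftarrow 1/\|K\|$, $v^{(0)} \leftarrow 0$, $k \leftarrow 0$
 2: while $k = 0$ or $\frac{1}{\lambda k}\|v^{(k)}\| > \epsilon$ do
 3:   $y^{(k+1)} \leftarrow (y^{(k)} - \lambda(A^T x^{(k)} + E_2^T q^{(k)}))_+$,  $p^{(k+1)} \leftarrow p^{(k)} - \lambda(e_1 - E_1 x^{(k)})$
 4:   $x^{(k+1)} \leftarrow (x^{(k)} + \lambda(A y^{(k+1)} - E_1^T p^{(k+1)}))_+$,  $\Delta x^{(k+1)} \leftarrow x^{(k+1)} - x^{(k)}$
 5:   $\Delta q^{(k+1)} \leftarrow \lambda(E_2 y^{(k+1)} - e_2)$,  $q^{(k+1)} \leftarrow q^{(k)} + \Delta q^{(k+1)}$
 6:   $y^{(k+1)} \leftarrow y^{(k+1)} - \lambda(A^T \Delta x^{(k+1)} + E_2^T \Delta q^{(k+1)})$,  $\Delta y^{(k+1)} \leftarrow y^{(k+1)} - y^{(k)}$
 7:   $p^{(k+1)} \leftarrow p^{(k+1)} + \lambda E_1 \Delta x^{(k+1)}$,  $\Delta p^{(k+1)} \leftarrow p^{(k+1)} - p^{(k)}$
 8:   $v^{(k+1)} \leftarrow v^{(k)} + (\Delta y^{(k+1)}, \Delta p^{(k+1)}, \Delta x^{(k+1)}, \Delta q^{(k+1)})$
 9:   $k \leftarrow k + 1$
10: end while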
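The steps of Algorithm 1 translate almost line by line into code. The following is a minimal pure-Python sketch under stated assumptions, not the implementation used for the experiments: matrices are dense lists of rows, the iteration starts from the zero point and runs for a fixed number of iterations instead of using the stopping test, and $\|K\|$ is replaced by the Frobenius norm of $K$ (an upper bound on the spectral norm, so the step-size stays in the admissible range). The function name `primal_dual` and the 2x2 example game are hypothetical.

```python
import math


def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def clip(v):
    """Componentwise (.)_+ = max(0, .)."""
    return [max(0.0, t) for t in v]


def primal_dual(A, E1, e1, E2, e2, num_iters=5000):
    """Run num_iters iterations of Algorithm 1 from the zero point."""
    AT, E1T, E2T = transpose(A), transpose(E1), transpose(E2)
    # Assumption: the Frobenius norm of K = [[A, -E1^T], [E2, 0]] stands in
    # for ||K||; it is an upper bound, hence lam <= 1 / ||K||.
    fro = math.sqrt(sum(t * t for M in (A, E1, E2) for row in M for t in row))
    lam = 1.0 / fro
    y, p = [0.0] * len(A[0]), [0.0] * len(E1)
    x, q = [0.0] * len(A), [0.0] * len(E2)
    v = [0.0] * (len(y) + len(p) + len(x) + len(q))
    for _ in range(num_iters):
        # Step 3: proximal-gradient step on (y, p).
        gy = [a + b for a, b in zip(matvec(AT, x), matvec(E2T, q))]
        y_new = clip([yi - lam * g for yi, g in zip(y, gy)])
        p_new = [pi - lam * (ei - t)
                 for pi, ei, t in zip(p, e1, matvec(E1, x))]
        # Step 4: proximal-gradient step on x, using the fresh (y, p).
        gx = [a - b for a, b in zip(matvec(A, y_new), matvec(E1T, p_new))]
        x_new = clip([xi + lam * g for xi, g in zip(x, gx)])
        dx = [a - b for a, b in zip(x_new, x)]
        # Step 5: update of q.
        dq = [lam * (t - ei) for t, ei in zip(matvec(E2, y_new), e2)]
        q_new = [a + b for a, b in zip(q, dq)]
        # Steps 6-7: correct (y, p) with the increments of (x, q).
        corr = [a + b for a, b in zip(matvec(AT, dx), matvec(E2T, dq))]
        y_new = [yi - lam * c for yi, c in zip(y_new, corr)]
        p_new = [pi + lam * t for pi, t in zip(p_new, matvec(E1, dx))]
        dy = [a - b for a, b in zip(y_new, y)]
        dp = [a - b for a, b in zip(p_new, p)]
        # Step 8: accumulate the increments.
        v = [a + b for a, b in zip(v, dy + dp + dx + dq)]
        y, p, x, q = y_new, p_new, x_new, q_new
    return y, p, x, q, v


# Illustrative 2x2 matrix game; E_k = (1, 1) and e_k = 1 encode the simplexes.
A = [[2.0, -1.0], [-1.0, 1.0]]
ones = [[1.0, 1.0]]
y, p, x, q, v = primal_dual(A, ones, [1.0], ones, [1.0])
```

For this particular game the unique equilibrium is $x = y = (0.4, 0.6)$ with value $0.2$ (so $p_0 = -q_0 = 0.2$), which is the point the iterates should approach. Note that the corrected $y^{(k+1)}$ of step 6 is not clipped, so individual iterates may be slightly infeasible.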


Theorem 2 (Ergodic / Cesàro $O(1/\epsilon)$ convergence). Let $d_0$ be
the Euclidean distance between the starting point
$(y^{(0)}, p^{(0)}, x^{(0)}, q^{(0)})$ of Algorithm 1 and the set of
equilibria for the GSP (15). Then given any $\epsilon > 0$, there
exists an index $k_0 \le 2 d_0 \|K\| / \epsilon$ such that after $k_0$
iterations the algorithm produces a quadruplet
$(y^{k_0}, p^{k_0}, x^{k_0}, q^{k_0})$ and a vector $v^{k_0}$ such that
$\|v^{k_0}\| \le \epsilon$ and
$v^{k_0} \in \partial[\hat{\Psi}_1(\cdot, \cdot, x^{k_0}, q^{k_0}) + \hat{\Psi}_2(y^{k_0}, p^{k_0}, \cdot, \cdot)](y^{k_0}, p^{k_0}, x^{k_0}, q^{k_0})$,
where


v 


(>=o) ._ 
a 



(17) 


Thus Algorithm 1 outputs a Nash $(\epsilon, 0)$-equilibrium for the
GSP (15) in at most $k_0$ iterations.

Proof. It is clear that the quadruplet $(\Psi_1, \Psi_2, g_1, g_2)$
satisfies assumptions B.1, B.2, B.3, B.5, and B.6 of [23] with
$L_{xx} = L_{yy} = 0$ and $L_{xy} = L_{yx} = \|K\|$. Now, one easily
computes the proximal operator of $g_j$ in closed form as
$\mathrm{prox}_{\lambda g_j}(a, b) = ((a)_+, b - \lambda e_j)$. With
all these ingredients in place, Algorithm 1 is then obtained from
[23, Algorithm T-BD] applied to the GSP (15) with the choice of
parameters: $\sigma = 1 \in (0, 1]$,
$\sigma_x = \sigma_y = 0 \in [0, \sigma)$,
$\lambda_{xy} := \frac{1}{\sigma L_{xy}}\sqrt{(\sigma^2 - \sigma_x^2)(\sigma^2 - \sigma_y^2)} = \sigma/\|K\| = 1/\|K\|$,
and $\lambda = \lambda_{xy} \in (0, \lambda_{xy}]$. The convergence
result then follows immediately from [23, Theorem 4.2]. □
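For completeness, the closed form of the proximal operator used in the proof can be verified directly from the definition of $\mathrm{prox}$ in subsection 1.4; the short derivation below (written for $g_1$, the case of $g_2$ being identical) is a worked check added for the reader.

```latex
\mathrm{prox}_{\lambda g_1}(a, b)
  = \operatorname*{argmin}_{(y, p)} \;
      \tfrac{1}{2}\|y - a\|^2 + \lambda\, i_{y \ge 0}
      \;+\; \tfrac{1}{2}\|p - b\|^2 + \lambda \langle e_1, p \rangle .
% The objective is separable in y and p:
%   y-part: projection of a onto the nonnegative orthant,  y = (a)_+ ;
%   p-part: smooth, stationarity gives  p - b + \lambda e_1 = 0 .
\Longrightarrow \quad
\mathrm{prox}_{\lambda g_1}(a, b) = \big( (a)_+ ,\; b - \lambda e_1 \big).
```

This is why each iteration of Algorithm 1 only involves clipping and a shift by $\lambda e_j$, with no projection onto the polytopes $Q_k$.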

3.3 Practical considerations 

Efficient computation of $Ay$ and $A^T x$. In Algorithm 1, most of
the time is spent pre-multiplying vectors by $A$ and $A^T$. For
flop-type Poker games like Texas Hold'em and Rhode Island Hold'em,
$A$ (and thus $A^T$ too) is very big (up to $10^{14}$ rows and
columns!) but is sparse and has a rich block-diagonal structure (each
block is itself the Kronecker product of smaller matrices) which can
be carefully exploited, as in [?]. Also, the sampling strategies
presented in the recent work [?] (section 6) for generating unbiased
estimates of $Ay$ and $A^T x$ would readily convert Algorithm 1 into
an online and much more scalable solver.

Computing $\|K\|$. A major ingredient in the proposed algorithm is
$\|K\|$, the 2-norm of the huge matrix $K$. This can be efficiently
computed using the power iteration. Also, since $\|K\|$ is only used
in defining the step-size $\lambda := 1/\|K\|$, it may be possible to
avoid computing $\|K\|$ altogether, and instead use a line-search /
backtracking strategy (see [24], e.g.) for setting $\lambda$.
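The power iteration mentioned above can be sketched in a few lines. The version below is a minimal illustration under assumptions of our own (dense list-of-rows storage, a fixed iteration count, and a seeded random start vector); the function name `spectral_norm` is hypothetical. It only uses products with $K$ and $K^T$, which is what makes the approach viable when $K$ is huge and sparse.

```python
import math
import random


def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]


def spectral_norm(K, iters=200, seed=0):
    """Estimate ||K||_2 by power iteration on K^T K.

    Assumption: the random start vector is not orthogonal to the leading
    right singular vector of K (true with probability one).
    """
    rng = random.Random(seed)
    KT = [list(col) for col in zip(*K)]
    v = [rng.gauss(0.0, 1.0) for _ in range(len(K[0]))]
    for _ in range(iters):
        u = matvec(KT, matvec(K, v))          # one application of K^T K
        nrm = math.sqrt(sum(t * t for t in u))
        if nrm == 0.0:                        # K v = 0: K is the zero map on v
            return 0.0
        v = [t / nrm for t in u]              # renormalize the iterate
    # v is a unit vector, so ||K v|| approximates the largest singular value.
    return math.sqrt(sum(t * t for t in matvec(K, v)))


# Illustrative K for the 2x2 game A = [[1, -1], [-1, 1]] with E_1 = E_2 = (1, 1).
K_example = [[1.0, -1.0, -1.0],
             [-1.0, 1.0, -1.0],
             [1.0, 1.0, 0.0]]
norm_K = spectral_norm(K_example)
step_size = 1.0 / norm_K
```

For this small example $K K^T$ has eigenvalues 4, 2 and 2, so the estimate should be close to $\|K\| = 2$. In practice a slightly conservative step-size (e.g. dividing by a small safety factor) guards against underestimating the norm.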

Game abstraction. For many variants of Poker, there has been
extensive research in lossy / lossless abstraction techniques (for
example [?] and, more recently, [?]), wherein strategically
equivalent or not-so-different situations in the game tree are lumped
together. This can drastically reduce the size of the state space
from a player's perspective, and ultimately, the size of the matrices
$A$, $E_1$, and $E_2$, without deviating much from the true game.

4 Numerical experiments results 

We now present some proof-of-concept results for the proposed
algorithm. Results are presented and commented in Figure 1.

Remark 1. We have not benchmarked our algorithm against the
algorithms proposed in [?] and by Gilpin et al. [?] because
implementing them from scratch for such games would require us to
compute the complicated projections $\mathrm{proj}_{Q_k}$. We recall
that avoiding these projections was one of the goals of the
manuscript.

4.1 Basic test-bed: Matrix games on simplexes 

As in [?], we generate a $1000 \times 1000$ random matrix whose
entries are uniformly distributed in the closed interval $[-1, 1]$.
The results of the experiments are shown in Figure 1(a).

4.2 Kuhn Poker, a “toy” sequential game 

This game is a simplified form of Poker developed by Harold W. Kuhn
in [28]. It already contains all the complexities (sequentiality,
imperfection of information, etc.) of a full-blown Poker game like
Texas Hold'em, but is simple enough to serve as a proof-of-concept
for the ideas developed in this manuscript. The deck includes only
three playing cards: a King, a Queen, and a Jack. One card is dealt
to each player, then the first player must bet or pass, then the
second player may bet or pass. If any player chooses to bet, the
opposing player must bet as well ("call") in order to stay in the
round. After both players pass or bet, the player with the highest
card wins the pot. The pair of vectors
$(x^*, y^*) \in \mathbb{R}^{13+13}$ given by

x* = [1, .759, .759, 0, .241, 1, .425, .575, 0, .275, 0, .275, .725]^T,
y* = [1, 1, 0, .667, .333, .667, .333, 1, 0, 0, 1, 0, 1]^T

is a Nash $(10^{-4}, 0)$-equilibrium computed in 1500 iterations of
Algorithm 1. The convergence curves are shown in Fig. 1. One easily
checks that this equilibrium is feasible up to small residuals.
Indeed, one computes

E_1 x* - e_1 = [4.76 x 10^-5, -1.91 x 10^-5, 5.67 x 10^-5, 8.23 x 10^-6,
                2.90 x 10^-5, -8.62 x 10^-7, -1.96 x 10^-5]^T,
E_2 y* - e_2 = [-7.04 x 10^-7, 2.27 x 10^-6, -3.29 x 10^-6, -1.50 x 10^-6,
                2.92 x 10^-6, -4.97 x 10^-7, -5.85 x 10^-7]^T.

Finally, one checks that $x^{*T} A y^* = -0.05555$, which agrees to
5 d.p. with the value of $-1/18$ computed analytically by H. W. Kuhn
in his 1950 paper [28]. The evolution of the dual gap and the
expected value of the game across iterations are shown in
Figure 1(b).



(a) $10^3 \times 10^3$ matrix game on simplexes    (b) Kuhn 3-card Poker

Fig. 1: Convergence curves of Algorithm 1. We stress that the
algorithms of Nesterov [15] and Gilpin [12] are included in the plots
only indicatively, since this is not meant to be a benchmark, as
already explained (Remark 1). (a) The duality gaps are computed
according to formula (9). One can see the linear (i.e., exponentially
fast) behavior of the algorithm in [12] in between consecutive
breakpoints on the $\epsilon$ grid (though the rate of linear
convergence seems to be quite close to 1 here). As expected, the
first-order smoothing algorithm labelled "Nesterov" [15] jitters
around as the iterations go on because even the smoothed problem
becomes heavily ill-conditioned near solutions. (b) Kuhn Poker. In
the top-right plot, we show the modified duality gap defined in (?).
In both cases, we see that the proved convergence rate for our
algorithm is empirically observed.


5 Concluding remarks and future work 

Making use of the sequence-form representation [?], we have devised a
simple numerically stable primal-dual algorithm for computing
Nash-equilibria in two-person zero-sum sequential games with
incomplete information (like Texas Hold'em, etc.). Our algorithm is
simple to implement, with a low constant cost per iteration, and
enjoys a rigorous convergence theory with a proven $O(1/\epsilon)$
convergence, in terms of basic operations (matvec products, clipping,
etc.), to a Nash $(\epsilon, 0)$-equilibrium of the game. In future,
we plan to run more experiments on real Poker games to measure the
practical power of the proposed algorithm compared to other competing
schemes like CFR and EGT.

In conclusion, Nash-equilibrium problems are convex-concave
saddle-point problems, and as such, a natural tool for tackling them
would be proximal primal-dual / operator-splitting algorithms. We
believe such methods will receive more attention in the algorithmic
game theory community in future.